ABSTRACT

Confronted with the problem of determining the
frequency of syntactical patterns in present-day written Australian
English, the author employs a method of analysis which produces an
output in the form of a two-dimensional line diagram showing all the
syntagms comprising the sentence under analysis. For the remaining
problem of sorting the diagrams into divisions and sub-divisions of
syntagms, the author advocates the use of a method of linearization
used for sorting structural diagrams of chemical compounds. A
description of the methodology is provided along with an explanation
of its adaptation to language analysis. (VM)
In: Linguistic Communications; 3,

THE DETERMINATION OF THE FREQUENCY OF SYNTACTICAL PATTERNS
IN PRESENT-DAY WRITTEN AUSTRALIAN ENGLISH. Report dated
15th May, 1970.

Ralph D. Beebe
Monash University

In advising the writer on this project, Professor
G. E. Hammarström has suggested that the frequency of
English syntagms could be determined by examining a
corpus of English sentences, dividing them first into
sentence types, then sub-dividing the sentence types
further, according to his system of syntactic terminology
(Hammarström 1967). A manual sorting of sentences in
that way would have been a process of great magnitude.

In searching for a more elegant method, the writer
first aimed at a computing program which would have
automatically analysed sentences into their syntagms.
He hoped to be able then to modify the program to sort
the sentences and their syntagms. Although such an
analysis program had been developed by Bratley, Thorne
and Dewar (1967), it proved to be incapable of being run
on any computer in Australia due to computer-language
incompatibilities. An alternative program
(Sager 1967) evolved at [illegible] University did not
provide an output adequate for the purposes of the
project. No other programs were currently available.

As a manual analysis therefore seemed inevitable,
the writer turned his attention to other large-scale manual
analysis work done previously. A fruitful area appeared
to be in studies of the writing of children. Notable
examples were those of La Brant (1933), Strickland (1962),
Loban (1963), and Hunt (1965). These studies showed a
growing tendency towards a more formal delineation of
sentence structure, but all indicated that a more complete
study could not be made until some more detailed system of
analysis had been devised.

The writer then turned his attention to using a method
of analysis which he had himself developed primarily for
teaching purposes. This method gave an output in the form
of a two-dimensional line diagram showing all the syntagms
comprising the sentence analysed. It was essentially a
surface-structure analysis using a form of dependency grammar.

The problem still remained, however, of how to sort such
diagrams into divisions and sub-divisions of syntagms.

The writer had observed that a somewhat similar problem
of sorting chemical compounds expressed in the form of
molecular-structure diagrams had been solved in various
ways throughout the world. He selected one way devised
by the U.S. Army Biological Laboratories (Wiswesser 1954)
and currently popular with many U.S. drug companies.

The selected method first reduced the two-dimensional
diagrams of molecular structure to linear strings of symbols,
and then sorted the strings by conventional computer
methods.



From the principles employed by Wiswesser, the writer
succeeded in learning how to linearize his own two-dimensional
diagrams of sentence structure, and the remainder of the
project can now be completed by writing a suitable computer
program for sorting the linear strings of symbols.



Further aid may be obtained in this phase of the project
by studies of the programs used in organic chemistry and
of new languages for the computer such as PL/1 and SNOBOL
devised especially for sorting strings of symbols.
Compatibility with the Monash University computer complex
will be an overriding consideration.

A statistical analysis of the results will determine
the required syntagm frequencies, and the syntagms might
then be allotted hierarchical distinctions using
Hammarström's proposed terminology.

By examining several different genres of present-day
written Australian English, the syntagm frequencies among
the genres can be compared, thus reducing the influence of
errors in the syntactical analysis.
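The comparison of syntagm frequencies among genres might be sketched as follows. This is an illustration only, written in Python, a language the report does not mention. The function name `relative_frequencies`, the genre labels and the miniature corpora are all hypothetical, invented for the example, and are not results of the project.

```python
from collections import Counter

def relative_frequencies(linear_strings):
    """Return each linearized pattern's share of the given sample."""
    counts = Counter(linear_strings)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pattern: n / total for pattern, n in counts.items()}

# Hypothetical miniature corpora: genre names and contents are invented.
genres = {
    "newspaper": ["N+D", "N+D+N", "N+D+N", "N+B+Q"],
    "fiction": ["N+D", "N+D", "N+D+N", "R+FN+N"],
}

for genre, strings in genres.items():
    print(genre, relative_frequencies(strings))
```

Because each genre is reduced to proportions rather than raw counts, samples of different sizes can be set side by side.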

BRIEF DESCRIPTION OF THE WISWESSER SYSTEM

The method of linearization used for sorting structural
diagrams of chemical compounds in the United States,
devised by Wiswesser (1954), and revised by Smith (1968),
first translates all conventional two-letter atomic symbols
into single letters, and also provides single-letter
identification symbols for groups of atoms forming
commonly-occurring radicals. For example, the halogens bromine and
chlorine, normally expressed by the symbols Br and Cl,
become E and G, so that the following list emerges:

E    bromine atom
F    fluorine atom
G    chlorine atom
H    hydrogen atom (although H is mostly unexpressed)
I    iodine atom






Added to the list are the following symbols for
various groups:

Q    hydroxyl group, -OH.

V    carbonyl connective, -C(=O)-
     (carbon connected to three other atoms).

W    nonlinear (branching) dioxo group as in
     -NO2, -SO2-. Not used for linear (unbranched)
     structures such as CO2, SiO2, NO2, SO2.

M    imino group, -NH-.

Z    amino group, -NH2.

Numerals are used to show the number of carbon atoms
in unbranched alkyl chains or segments.



Thus the following unbranched compounds are expressed
in linear notation as shown:

(1)  CH3-CO-CH3                 1V1
(2)  CH3CH2-O-CH2CH3            2O2
(3)  HO-CH2CH2-OH               Q2Q
(4)  O2N-CH2-O-CH2-NO2          WN1O1NW
(5)  H2N-CH2CH2CH2-NH2          Z3Z
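The numeral rule for unbranched chains can be illustrated with a minimal sketch. It assumes, purely for the example, that a compound has already been written as a list of Wiswesser symbols in chain order, with each alkyl carbon marked by the placeholder "C". That input format and the function name `encode_unbranched` are hypothetical and form no part of the Wiswesser system.

```python
from itertools import groupby

def encode_unbranched(units):
    """Replace each run of alkyl carbons ("C") by its length; copy other symbols."""
    out = []
    for symbol, run in groupby(units):
        n = len(list(run))
        if symbol == "C":
            out.append(str(n))      # a chain of n carbons becomes the numeral n
        else:
            out.append(symbol * n)  # other symbols are cited unchanged
    return "".join(out)

print(encode_unbranched(["C", "V", "C"]))                      # 1V1
print(encode_unbranched(["C", "C", "O", "C", "C"]))            # 2O2
print(encode_unbranched(["Q", "C", "C", "Q"]))                 # Q2Q
print(encode_unbranched(["W", "N", "C", "O", "C", "N", "W"]))  # WN1O1NW
```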
For branched compounds, a graphic formula is first
interposed between the structural formula and the eventual
linearization, rules being laid down for linearizing the
graphic formula. In the following simplified description,
these rules are abbreviated to the point of inadequacy, but
they serve to demonstrate the basis for the eventual set of
rules devised by the writer for his sentence diagrams.



Thus observe the following linearizations:

Structural Formula        Graphic Formula      Linearization

[first example, containing an -OH branch and an
O-CH3 group: illegible in source]

CH3CH2-B-CH2CH3             2 B 2              2B2&2
       |                      |
       CH2CH3                 2



The rules state that the linearization of a graphic
formula is performed by citing the symbols along a main
chain until a branching point is reached, digressing along
the branch, then returning, after the end of the branch is
reached, to the main chain, inserting an extra symbol (&)
before resuming the symbols of the main chain. If the
branch terminates in a symbol which cannot be followed in
any case along that branch by other symbols, then it is
a 'terminating' symbol, and there is no need to insert the
resumption symbol (&) when continuing along the main chain.

In the first example above, Q is a 'terminating' symbol
known to be such by an organic chemist, so there is no
need to use the resumption symbol when continuing along
the main chain after dealing with the branch chain. In
the second example, however, the symbols at the ends of the
branches are not 'terminating' symbols, as they can each be
followed along their branches by other symbols, information
which again is known by the organic chemist who encodes the
diagram.

Thus the inherent technical knowledge of the encoder
enables him to encode correctly.
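The resumption rule can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Wiswesser rule set: the `Node` structure, the function name `linearize` and the one-member set `TERMINATING` are hypothetical, and the symbol "Y" in the last example is an arbitrary stand-in for a branching point.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Only Q is named as a 'terminating' symbol in the text; a fuller set
# would have to be taken from the Wiswesser rules themselves.
TERMINATING = {"Q"}

@dataclass
class Node:
    symbol: str
    branches: List["Node"] = field(default_factory=list)  # side chains starting here
    next: Optional["Node"] = None                          # continuation of this chain

def linearize(node: Optional[Node]) -> str:
    """Cite symbols along a chain, digressing along each branch as it is met."""
    out = []
    while node is not None:
        out.append(node.symbol)
        for i, branch in enumerate(node.branches):
            text = linearize(branch)
            out.append(text)
            more_follows = i < len(node.branches) - 1 or node.next is not None
            # '&' marks the return from a branch, unless the branch ended
            # in a terminating symbol or nothing remains to be cited.
            if more_follows and text[-1] not in TERMINATING:
                out.append("&")
        node = node.next
    return "".join(out)

# Second example of the text: graphic formula 2 B 2 with a branch 2 on B.
second_example = Node("2", next=Node("B", branches=[Node("2")], next=Node("2")))
print(linearize(second_example))  # 2B2&2

# Hypothetical chain with a terminating Q branch: no '&' is inserted.
print(linearize(Node("1", next=Node("Y", branches=[Node("Q")], next=Node("1")))))  # 1YQ1
```

The set of terminating symbols stands in for the encoder's technical knowledge. A program has to be given that knowledge explicitly, whereas the chemist supplies it unprompted.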

The Wiswesser system covers not only unbranched
and branched chains, but also cyclic compounds, utilizing
in all some 250 rules. In the encoding of sentence
diagrams, however, only a few of the rules of the
Wiswesser system are needed. These selected rules have
been drastically simplified in the brief description given
above. Their application to sentence-diagram encoding
will now be described in detail.

APPLICATION OF THE WISWESSER SYSTEM TO SENTENCE DIAGRAMS

The appendix gives some examples of the encoding
of sentence diagrams. The four basic types of English
sentences, distinguished by their verb types, are encoded
as follows:

(1)  John shuddered                  N+D
(2)  John injured Jim                N+D+N
(3)  John was sick                   N+B+Q
(4)  They elected John captain       R+FN+N



The D in the graphic formula of sentence (4) above has
been omitted from the linearization. This has been done
because D is an essential element of a factitive predicator
F and can therefore be assumed to be present without being
specifically mentioned. Its omission is similar to the
omission of the hydrogen symbol from the alkyl group in the
Wiswesser system.






A similar omission of the symbol for the preposition
can be made in every prepositional phrase since every such
phrase must commence with a preposition. It is only
necessary to insert the symbol H for the phrase and go
straight on to consider the other elements apart from the
preposition. The normal element accompanying the
preposition in the phrase is the noun, but that element
can be replaced by various substitutes such as the pronoun,
or non-finite verb. If the noun is present, it can be
omitted from the linearization; only the symbol for its
substitute need be included when such a substitute is
present. On the other hand, any dependencies of the noun
must be shown, as in sentences (5) and (6).

(5)  John struck Jim in anger          N+DH+N
(6)  John struck Jim in great anger    N+DHQ+N

There can be no ambiguity concerning the Q in
sentence (6) since an adjective cannot be used to
describe a preposition. The Q must be a dependency
of the N in the phrase H.



This is an example of the inherent technical knowledge
of the encoder enabling him to encode correctly, a parallel
operation to that of the organic chemist encoding chemical
compounds by the Wiswesser system.
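The omission rules for the factitive predicator and the prepositional phrase can be sketched as follows. This is a minimal illustration, assuming a sentence is supplied as a list of main elements, each carrying its dependencies. The classes `Element` and `PrepPhrase` and the function names are hypothetical, and the resumption symbol (&) is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Element:
    symbol: str
    dependents: List[Union["Element", "PrepPhrase"]] = field(default_factory=list)

@dataclass
class PrepPhrase:
    head: Element  # the noun of the phrase, or its substitute

def encode_element(e) -> str:
    if isinstance(e, PrepPhrase):
        # The preposition is always omitted; the noun is omitted too,
        # but a substitute (e.g. R) and any dependencies are shown.
        head = "" if e.head.symbol == "N" else e.head.symbol
        return "H" + head + "".join(encode_element(d) for d in e.head.dependents)
    return e.symbol + "".join(encode_element(d) for d in e.dependents)

def encode_sentence(main_elements) -> str:
    """Join the main elements with '+', the governing relationship."""
    return "+".join(encode_element(e) for e in main_elements)

# (4) They elected John captain: D is implied by F and so not cited.
print(encode_sentence([Element("R"), Element("F", [Element("N")]), Element("N")]))

# (6) John struck Jim in great anger.
in_great_anger = PrepPhrase(Element("N", [Element("Q")]))
print(encode_sentence([Element("N"), Element("D", [in_great_anger]), Element("N")]))
```

The two calls print R+FN+N and N+DHQ+N, the linearizations given for sentences (4) and (6).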

The advantages of the linearization system become more
evident when more complicated sentences are considered. See
Appendix, sentences (7) and (8).

It is clear that the sorting of the strings is, comparatively
speaking, the least problematical part of the project.
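One possible way of sorting the strings into divisions and sub-divisions is sketched below. The criterion is an assumption made for the illustration and is not prescribed by the report: the first symbol of each "+"-separated field is taken as the skeleton of the sentence (the division), and the full strings sharing that skeleton are its sub-divisions. The function names `skeleton` and `sort_into_divisions` are hypothetical.

```python
from collections import defaultdict

def skeleton(linear: str) -> str:
    """Keep only the first symbol of each '+'-separated field."""
    return "+".join(part[0] for part in linear.split("+"))

def sort_into_divisions(strings):
    """Sort the strings, then group them under their skeletons."""
    divisions = defaultdict(list)
    for s in sorted(strings):
        divisions[skeleton(s)].append(s)
    return dict(sorted(divisions.items()))

# Linearizations quoted in the text for sentences (1) to (7).
corpus = ["N+D", "N+D+N", "N+B+Q", "R+FN+N", "N+DH+N", "N+DHQ+N", "NTH+DHT+NT"]
for division, members in sort_into_divisions(corpus).items():
    print(division, members)
```

Under this criterion sentences (2), (5), (6) and (7) fall together in the division N+D+N, differing only in their dependencies.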

APPENDIX

1. SYMBOL CODE FOR STRUCTURAL DIAGRAM

Adj       -  Adjective
Adv       -  Adverb
AG        -  Appositive Group
C         -  Conjunction
CG        -  Coordinate Group
Cl        -  Clause
Comp      -  Complement
D         -  Degree
Exp       -  Non-finite Expression
F         -  Frequency
FV        -  Finite Verb
Fac Pred  -  Factitive Predicator
M         -  Manner
N         -  Noun
Neg       -  Negation
NFV       -  Non-finite Verb
O         -  Object
P         -  Place
Phr       -  Phrase
Pn        -  Pronoun
Prep      -  Preposition
S         -  Subject
Sup       -  Supplement
T         -  Time






2. SYMBOL CODE FOR GRAPHIC FORMULA AND LINEARIZATION

A    Appositive
B    Being verb
C    Coordinator
D    Doing verb
E    past participlE
F    Factitive predicator
G    inG verb-form
H    prepositional pHrase
I    Intensifier
J    reJector
K    infinitive
L    cLause
M    Modifier
N    Noun
O    cOmpound verb
P    Preposition
Q    Qualifier
R    pRonoun
T    deTerminer
U    sUbordinator
V    passiVe verb-form
W    having, costing, or Weighing verb
X    non-finite expression
Y    numeralitY
Z    possessive
&    return to main chain
+    governing relationship






3. EXAMPLES OF THE ENCODING OF SENTENCE DIAGRAMS

(1) Sentence: John shuddered
Structural Diagram:

    S(N) ------ FV
    (John)      (shuddered)

Graphic Formula:

    N ------ D

Linearization: N+D

(2) Sentence: John injured Jim
Structural Diagram:

    S(N) ------ FV ------- O(N)
    (John)      (injured)  (Jim)

Graphic Formula:

    N ------ D ------ N

Linearization: N+D+N

(3) Sentence: John was sick
Structural Diagram:

    S(N) ------ FV ------ Comp(Adj)
    (John)      (was)     (sick)

Graphic Formula:

    N ------ B ------ Q

Linearization: N+B+Q
(4) Sentence: They elected John captain
Structural Diagram:

                  Fac Pred
    S(Pn) ----- FV + Comp(N) ----- O(N)
    (They)   (elected) (captain)   (John)

Graphic Formula:

    R ----- D + N ----- N

Linearization: R+FN+N






(5) Sentence: John struck Jim in anger
Structural Diagram:

    S(N) ------ FV ------- O(N)
    (John)      (struck)   (Jim)
                |
                +-- Adv Phr(M): Prep (in) + O(N) (anger)

Graphic Formula:

    N ------ D ------ N
             |
             +-- H: P + N

Linearization: N+DH+N



(6) Sentence: John struck Jim in great anger.
Structural Diagram:

    S(N) ------ FV ------- O(N)
    (John)      (struck)   (Jim)
                |
                +-- Adv Phr(M): Prep (in) + O(N) (anger)
                                            |
                                            +-- Adj (great)

Graphic Formula:

    N ------ D ------ N
             |
             +-- H: P + N
                        |
                        +-- Q

Linearization: N+DHQ+N



(7) Sentence: The boy from Melbourne kicked the ball
into the net.
Structural Diagram:

    S(N) ---------- FV ---------- O(N)
    (boy)           (kicked)      (ball)
    |               |             |
    |               |             +-- Adj(Det) (the)
    |               |
    |               +-- Adv Phr(P): Prep (into) + O(N) (net)
    |                                             |
    |                                             +-- Adj(Det) (the)
    |
    +-- Adj(Det) (The)
    +-- Adj Phr: Prep (from) + O(N) (Melbourne)

Graphic Formula:

    N ---------- D ---------- N
    |            |            |
    |            |            +-- T
    |            |
    |            +-- H: P + N
    |                       |
    |                       +-- T
    |
    +-- T
    +-- H: P + N

Linearization: NTH+DHT+NT
(8) Sentence: The Governor-General's opportunities for
independent judgement on constitutional issues are
severely limited.
Structural Diagram:

    S(N) ---------------- FV ------ Comp(Adj)(NFV)
    (opportunities)       (are)     (limited)
    |                               |
    |                               +-- Adv(D) (severely)
    |
    +-- Adj(Det) (The)
    +-- Adj(Poss) (Governor-General's)
    +-- Adj Phr: Prep (for) + O(N) (judgement)
                              |
                              +-- Adj (independent)
                              +-- Adj Phr: Prep (on) + O(N) (issues)
                                                       |
                                                       +-- Adj (constitutional)

Graphic Formula: [illegible in source]

Linearization: NTZ&HQ&HQ+B+EI



